Deloitte | Data Engineer (Databricks/PySpark Developer) | 5 YOE



Round 1: Technical Interview

1. SQL Query:

🔹Problem: Find the names of managers who have at least 7 employees directly reporting to them.

Sample Query:

SELECT emp_name
FROM employees
WHERE emp_id IN (SELECT manager_id
                 FROM employees
                 GROUP BY manager_id
                 HAVING COUNT(emp_id) >= 7);
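A quick way to sanity-check the query is to run it against a tiny in-memory table (a sketch using Python's stdlib `sqlite3`; the table data is hypothetical, only the schema `employees(emp_id, emp_name, manager_id)` comes from the problem):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, emp_name TEXT, manager_id INTEGER)")

# Hypothetical data: mgr_a has 7 direct reports, mgr_b has only 1.
rows = [(1, "mgr_a", None)] + [(i, f"e{i}", 1) for i in range(2, 9)]
rows += [(100, "mgr_b", None), (101, "e101", 100)]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# The interview query: the inner subquery finds manager_ids with >= 7
# direct reports; the outer query resolves those ids to names.
managers = conn.execute("""
    SELECT emp_name
    FROM employees
    WHERE emp_id IN (SELECT manager_id
                     FROM employees
                     WHERE manager_id IS NOT NULL
                     GROUP BY manager_id
                     HAVING COUNT(emp_id) >= 7)
""").fetchall()
print(managers)  # only mgr_a qualifies
```

Note the added `WHERE manager_id IS NOT NULL` guard: top-level employees with no manager would otherwise feed a NULL group into the aggregation.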

🔹Problem: For each student, fetch the row with their highest score (along with the year it was achieved).

Input:

name year scores

xyz 2018 560

abc 2020 700

def 2016 400

xyz 2019 580

abc 2018 800

def 2017 500

Output:

xyz 2019 580

abc 2018 800

def 2017 500
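One common solution pattern (an assumption on my part; the interviewer may equally accept a correlated subquery or self-join) is to rank rows per student with `ROW_NUMBER()` and keep rank 1. Sketched against the sample data with stdlib `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, year INTEGER, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    ("xyz", 2018, 560), ("abc", 2020, 700), ("def", 2016, 400),
    ("xyz", 2019, 580), ("abc", 2018, 800), ("def", 2017, 500),
])

# Rank each student's rows by score (highest first) and keep rank 1.
best = conn.execute("""
    SELECT name, year, score
    FROM (SELECT name, year, score,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY score DESC) AS rn
          FROM scores) t
    WHERE rn = 1
    ORDER BY name
""").fetchall()
print(best)  # abc 2018 800, def 2017 500, xyz 2019 580
```

If ties on the top score should all be returned, `RANK()` would replace `ROW_NUMBER()`.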

🔹Problem: Use the LAG window function to find each student's previous year's score.

Input:

xyz 2018 560

xyz 2019 580

abc 2018 800

abc 2020 700

def 2016 400

def 2017 500
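A minimal sketch of the `LAG` solution against the sample data (again via stdlib `sqlite3`; the `prev_score` alias is my own naming): `LAG(score)` pulls the score from the preceding row within each student's partition, ordered by year, and returns NULL for the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, year INTEGER, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    ("xyz", 2018, 560), ("xyz", 2019, 580),
    ("abc", 2018, 800), ("abc", 2020, 700),
    ("def", 2016, 400), ("def", 2017, 500),
])

# For each student (PARTITION BY name), order rows by year and fetch the
# previous row's score; the earliest year per student gets NULL.
result = conn.execute("""
    SELECT name, year, score,
           LAG(score) OVER (PARTITION BY name ORDER BY year) AS prev_score
    FROM scores
    ORDER BY name, year
""").fetchall()
for row in result:
    print(row)
```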

🔹Problem: Aggregate surface areas and calculate cumulative surface area.

Sample Query (note: `LAG` only adds the immediately preceding row's value, not a true running total; a cumulative sum needs `SUM() OVER` with a frame):

SELECT continent, surface_area,
       SUM(surface_area) OVER (PARTITION BY continent
                               ORDER BY surface_area
                               ROWS UNBOUNDED PRECEDING) AS CSA
FROM surface_area;
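To see the difference concretely, here is the cumulative query run over made-up continent data (the table contents are hypothetical; only the schema `surface_area(continent, surface_area)` comes from the problem):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surface_area (continent TEXT, surface_area REAL)")
conn.executemany("INSERT INTO surface_area VALUES (?, ?)", [
    ("Asia", 10.0), ("Asia", 20.0), ("Asia", 30.0),
    ("Europe", 5.0), ("Europe", 15.0),
])

# Running total per continent: the frame ROWS UNBOUNDED PRECEDING sums
# everything from the partition start up to the current row.
csa = conn.execute("""
    SELECT continent, surface_area,
           SUM(surface_area) OVER (PARTITION BY continent
                                   ORDER BY surface_area
                                   ROWS UNBOUNDED PRECEDING) AS CSA
    FROM surface_area
    ORDER BY continent, surface_area
""").fetchall()
print(csa)  # Asia: 10, 30, 60; Europe: 5, 20
```

A `LAG`-based version would produce 30, 50 for Asia's second and third rows (current + previous only) instead of the running totals 30, 60.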

🔹Azure Data Factory scenario questions built around key activities: GetMetadata, ForEach, and Copy Data.

Round 2: Managerial Round

🔹Standard self-introduction and career background.

🔹Discussion on previous projects with a focus on Spark optimization and Azure services.

🔹Specific questions on Spark performance tuning, optimization techniques, and hands-on experience with Azure services.

Round 3: Director Interview (Data & AI Unit)

🔹In-depth questions on previous data engineering projects.

🔹SQL Queries Focused on GROUP BY scenarios and aggregate functions.

🔹Scenario-based questions on optimizing workflows and data pipelines in Databricks.